This week we are beginning to work with geospatial data in the R
environment. As discussed in class, geospatial data are characterized by
geometry that typically corresponds to actual locations in the real
world. Maps are a two-dimensional abstraction of the three-dimensional
space we all inhabit. The good news is that we can create maps using the
ggplot2 package in much the same way that we have used it to
generate barcharts, boxplots, and scatterplots… 😉
The general structure will remain the same: the graphics are
constructed in an additive way, with the components linked together
using the + operator. The ggplot function opens,
or initializes, a new plot, while the aes, or aesthetic
mappings, function outlines which attributes or columns of the
data will be represented by different elements of the
plot we are creating. The aes function is often
nested within the ggplot function, but it may also appear in
a geom_* function. If there is no
geom_* function, then you will not see anything in the
output, because the geom_* determines the type of graphic we are
creating (e.g., geom_point, geom_bar, geom_sf, etc.).
Take a moment now to refresh your memory by inspecting the help
documentation pages for the ggplot, aes, and
geom_sf functions, paying close attention to the list of
Arguments and the Examples for each
one.
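As a refresher, the additive structure described above can be sketched as follows. This minimal example uses the nc shapefile that ships with the sf package (not the Charlottesville data used later in this lab), so it runs without any download:

```r
library(sf)
library(ggplot2)

# The nc shapefile is bundled with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

ggplot() +                     # ggplot() initializes a new (empty) plot
  geom_sf(data = nc,           # geom_* determines the type of graphic
          aes(fill = AREA)) +  # aes() maps a column to a visual property
  theme_minimal()              # components are linked with the + operator
```

Omitting the geom_sf() line would produce an empty plot, exactly as described above.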
Recall that scale_*_* functions are used to modify
and/or control the way that the attributes in the aesthetic mappings from
the aes function are translated to visual properties of the
graphic being created. These will be specific to the component
of the graphic and to the nature of the data
attribute we are displaying. For example, if we are displaying
mean recorded NO_2 values at the neighborhood scale using the fill
argument of the aes function, we would use
scale_fill_continuous or perhaps
scale_fill_gradient or scale_fill_viridis
because we want to target the attribute we have mapped to the shading
(or fill) of the neighborhood polygons AND because the
attribute from our data is continuous rather than discrete
(categorical). This means that scale_fill_brewer and
scale_fill_discrete are likely to throw an error.
Similarly, using scale_colour_continuous or
scale_size_continuous will not work properly because we
want to control the shading (i.e., fill) of the polygons rather than the
color of the polygon boundary or the size of points that may be
displayed in our graphic.
Just to round out this discussion, the
scale_fill_gradient function creates a two-color gradient
(i.e., from low to high), the scale_fill_gradient2 function
creates a diverging color gradient (i.e., from low to mid to high), and
the scale_fill_gradientn function creates an n-color
gradient. This page has more information on colors using the viridis
package and this page focuses on the RColorBrewer
package.
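To make the distinction concrete, here is a hypothetical sketch that applies each of the three gradient scales to the same continuous fill mapping (the data frame and its value column are invented purely for illustration):

```r
library(ggplot2)

# Invented example data: a 10 x 10 grid with a continuous value
df <- expand.grid(x = 1:10, y = 1:10)
df$value <- df$x + df$y

base <- ggplot(df, aes(x = x, y = y, fill = value)) + geom_tile()

# Two-color gradient: low -> high
base + scale_fill_gradient(low = "white", high = "darkred")

# Diverging gradient: low -> mid -> high, centered on a midpoint
base + scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                            midpoint = mean(df$value))

# n-color gradient: interpolates through any vector of colors
base + scale_fill_gradientn(colors = c("navy", "green4", "gold", "firebrick"))
```

Swapping in scale_fill_brewer here would fail when the plot is drawn, because brewer palettes expect a discrete (categorical) variable.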
Enough already… 😫 Let’s make some maps using data from the Charlottesville Open Data Portal!
neighborhoods_0 <- st_read("https://opendata.arcgis.com/datasets/5fb42d80700b4bcd97990db16f2cc1c0_97.geojson")
## Reading layer `Assessment_Neighborhoods' from data source
## `https://opendata.arcgis.com/datasets/5fb42d80700b4bcd97990db16f2cc1c0_97.geojson'
## using driver `GeoJSON'
## Simple feature collection with 68 features and 6 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -78.52375 ymin: 38.00959 xmax: -78.44651 ymax: 38.07049
## Geodetic CRS: WGS 84
st_crs(neighborhoods_0)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
a <- ggplot() +
geom_sf(data = neighborhoods_0, aes(fill = NeighHood), show.legend = FALSE) +
theme_void() +
scale_fill_viridis(discrete = TRUE) +
labs(fill = "Cville Neighborhoods") +
coord_sf(crs = "ESRI:54004") +
ggtitle("Mercator", subtitle = "Shape and Direction Preserving Projection")
b <- ggplot() +
geom_sf(data = neighborhoods_0, aes(fill = NeighHood), show.legend = FALSE) +
theme_void() +
scale_fill_viridis(discrete = TRUE) +
labs(fill = "Cville Neighborhoods") +
coord_sf(crs = "ESRI:54009") +
ggtitle("Mollweide", subtitle="Equal Area Projection")
c <- ggplot() +
geom_sf(data = neighborhoods_0, aes(fill = NeighHood), show.legend = FALSE) +
theme_void() +
scale_fill_viridis(discrete = TRUE) +
labs(fill = "Cville Neighborhoods") +
coord_sf(crs = "ESRI:102003") +
ggtitle("Albers", subtitle = "Equal Area Projection")
d <- ggplot() +
geom_sf(data = neighborhoods_0, aes(fill = NeighHood), show.legend = FALSE) +
theme_void() +
scale_fill_viridis(discrete = TRUE) +
labs(fill = "Cville Neighborhoods") +
coord_sf(crs = "ESRI:54032") +
ggtitle("Azimuthal Equidistant", subtitle = "Distance Preserving Projection")
e <- ggplot() +
geom_sf(data = neighborhoods_0, aes(fill = NeighHood), show.legend = FALSE) +
theme_void() +
scale_fill_viridis(discrete = TRUE) +
labs(fill = "Cville Neighborhoods") +
coord_sf(crs = "ESRI:102747") +
ggtitle("Virginia State Plane South NAD83 Feet", subtitle = "Shape and Distance Preserving Projection")
f <- ggplot() +
geom_sf(data = neighborhoods_0, aes(fill = NeighHood), show.legend = FALSE) +
theme_void() +
scale_fill_viridis(discrete = TRUE) +
labs(fill = "Cville Neighborhoods") +
coord_sf(crs = "EPSG:32617") +
ggtitle("UTM Zone 17 North WGS84", subtitle = "Shape Preserving Projection")
ggarrange(a, b, c, d, e, f, nrow = 2, ncol = 3)
The screen captures below show how we located the
URL used in the code chunk above to access the neighborhoods data
layer using the st_read function.
In the code chunk, we use st_crs to display the
coordinate reference system of the data layer we downloaded, which is the
standard WGS 84 system that many data portals use. This is a
geographic coordinate system, which means its units are degrees and that
those units are not uniform at all locations on the Earth. Typically we
want to perform calculations with geospatial data that have had a map
projection applied. The ggpubr::ggarrange function is
used to display several selected plots using different map projections
in a paneled layout. Note: the coord_sf
function allows us to display the data layer in a different coordinate
system without having to change the actual data object
itself. Recall that the st_transform function allows us to
actually change the coordinate system of the data
object itself.
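The difference between the two approaches can be sketched as follows, using the built-in nc dataset from the sf package as a stand-in for neighborhoods_0:

```r
library(sf)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Display only: coord_sf reprojects on the fly for this one plot;
# the coordinates stored in nc remain in their original CRS
ggplot() +
  geom_sf(data = nc) +
  coord_sf(crs = "ESRI:54009")   # Mollweide, for display purposes only

# Actual transformation: a new object whose stored coordinates
# have been converted to the Mollweide projection
nc_moll <- st_transform(nc, crs = "ESRI:54009")
st_crs(nc_moll)                  # confirms the CRS has changed
```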
# Get the parks data layer for Charlottesville...
parks_0 <- st_read("https://opendata.arcgis.com/datasets/a13bdf43fff04168b724a64f7dca234d_19.geojson")
## Reading layer `Park_Area' from data source
## `https://opendata.arcgis.com/datasets/a13bdf43fff04168b724a64f7dca234d_19.geojson'
## using driver `GeoJSON'
## Simple feature collection with 64 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -78.58927 ymin: 38.00961 xmax: -78.4435 ymax: 38.10078
## Geodetic CRS: WGS 84
st_crs(parks_0)
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
# Municipal boundary too...
city_limits_0 <- st_read("https://opendata.arcgis.com/datasets/43253262b4da436bbac25d8cdb2f043b_44.geojson")
## Reading layer `Municipal_Boundary_Area' from data source
## `https://opendata.arcgis.com/datasets/43253262b4da436bbac25d8cdb2f043b_44.geojson'
## using driver `GeoJSON'
## Simple feature collection with 1 feature and 1 field
## Geometry type: POLYGON
## Dimension: XYZ
## Bounding box: xmin: -78.52377 ymin: 38.00968 xmax: -78.44636 ymax: 38.07053
## z_range: zmin: 0 zmax: 0
## Geodetic CRS: WGS 84
city_limits_stateplane_ft <- st_transform(city_limits_0, crs = "ESRI:102747")
# Change its coordinate system (i.e., project it)
parks_stateplane_ft <- st_transform(parks_0, crs = "ESRI:102747")
nhoods_stateplane_ft <- st_transform(neighborhoods_0, crs = st_crs(parks_stateplane_ft))
# Make a map with both layers...
ggplot() +
geom_sf(data = nhoods_stateplane_ft, aes(fill = "white")) +
theme_void() +
theme(legend.position = "left") +
geom_sf(data = parks_stateplane_ft, aes(fill = "green4"), alpha = 0.6) +
scale_fill_manual(name = "Legend",
values = c("green4", "white"), labels = c("Parks", "Neighborhoods"))
In the above code chunk, we download the Park
Area data layer and display it on the map along with the
neighborhood boundaries. Things get a little complicated because these
are both polygon layers and we want to shade them via the
fill argument. ggplot2 does not expect
multiple layers to use the same aesthetic mapping argument, so we
work around this by leveraging the scale_fill_manual
function, which allows us to override the default legend that would
otherwise have been created 😁
Recall that if arguments are specified inside the
aes function, a legend will be displayed by
default. Also recall that the alpha argument is how we
set the transparency of the geom_* we are mapping;
values range from 0 (completely transparent) to 1 (completely opaque).
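Both points can be illustrated with a tiny invented example (the data frame and column names here are made up for demonstration):

```r
library(ggplot2)

df <- data.frame(x = 1:6, y = c(2, 4, 3, 6, 5, 7),
                 grp = rep(c("a", "b"), each = 3))

# fill is mapped inside aes(), so a legend appears by default;
# alpha is set outside aes(): 0 = fully transparent, 1 = fully opaque
ggplot(df, aes(x = x, y = y, fill = grp)) +
  geom_col(alpha = 0.4)
```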
Adding map elements like a north arrow and scale bar is more difficult
in R than in a dedicated GIS software package, but if you feel the need
to do so, here are a few options:
Explore these as your time and interest dictate.
This exercise asks you to practice some of what we have learned about mapping geospatial data with ggplot2 and explores some of the geometric (topological) functionality available in the sf package. Using what the preceding portions of this R Notebook have demonstrated (i.e., read the Markdown sections of this document), please try your hand 😂 at the following:
- … use the st_read function to bring it down from the interwebs 👍
- … the scale_fill_manual line of code from the preceding chunk
- … use the st_join function to create a new sf object where the x argument is the projected parks layer, the y argument is the projected city limits layer, the join argument is st_intersects, and the left argument is TRUE
- … nhoods_stateplane_ft <- nhoods_stateplane_ft %>% mutate(NumStreetTrees = lengths(st_intersects(nhoods_stateplane_ft, street_trees_stateplane_ft)))
- … st_intersects and for lengths)

We have seen how to interact with vector data (e.g., points, lines, polygons) in the R environment from the in-class script we began during the last class session, and we continued that exploration in the first portion of this Lab Assignment. Now we will demonstrate how to interact with raster (e.g., grid cells, pixels) datasets. The sf package only handles vector data, and the primary package for working with raster data in R is (appropriately) called raster 😆 At some point in the future the terra package will succeed the raster package, but they are very similar in their form and function, as described in greater detail here.
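To see the raster-side workflow without downloading anything, here is a small sketch built around a fake in-memory raster (the extent and temperature values are invented; the rough terra equivalents are noted in the comments):

```r
library(raster)   # legacy raster package used in this lab

# Build a small in-memory raster; with terra this would be terra::rast()
r <- raster(nrows = 10, ncols = 10,
            xmn = -78.53, xmx = -78.44, ymn = 38.01, ymx = 38.07,
            crs = "EPSG:4326")
values(r) <- runif(ncell(r), min = 70, max = 95)   # fake temperatures (F)

# Reproject the raster; with terra this would be terra::project()
r_proj <- projectRaster(r, crs = "ESRI:102747")
crs(r_proj)   # confirms the new coordinate reference system
```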
Again, raster data differ
from vector data, and the R functions that we use are (therefore)
different. For example, the st_crs function displays the
coordinate system of both vector sf objects and raster
objects, but if we want to change the coordinate system of a raster we use
raster::projectRaster (rather than st_transform), as shown in the
code chunk below. The
links below provide an overview of the functions available in the
raster and terra packages for handling raster
data:
In the code chunk below, we use raster::extract to
calculate the mean and standard deviation of grid cells that fall within
the boundaries of each neighborhood polygon. The data are from a local
heat mapping initiative where the Charlottesville Office of
Sustainability organized volunteers to collect temperature data from
bikes and automobiles on August 24, 2021. These data are now available
for download on the City’s
website and from the OSF repository. The details of the data
collection are available
in this report and the machine learning methods used to generate the
rasters are described in these publications:
Basically, the raster layers represent estimates of air temperature across the city based on the data that these volunteers collected (after an interpolation process documented in the journal articles above).
# Create a unique identifier for use later...
nhoods_stateplane_ft <- nhoods_stateplane_ft %>%
mutate(ID = FID + 1)
# Read in and apply the map projection to the raster data layer...
pm_rast_0 <- raster::raster("./data/rasters_chw_charlottesville_010622/pm_t_f.tif")
pm_rast <- projectRaster(pm_rast_0, crs = "ESRI:102747")
# Use the extract function to calculate mean and standard deviation of grid cells (i.e. estimated temp)
# for each of the neighborhood polygons...
mean_temps <- extract(pm_rast, nhoods_stateplane_ft, fun = mean, na.rm = TRUE, df = TRUE)
colnames(mean_temps) <- c("ID", "MeanTemp")
std_dev_temps <- extract(pm_rast, nhoods_stateplane_ft, fun = sd, na.rm = TRUE, df = TRUE)
colnames(std_dev_temps) <- c("ID", "StdDevTemp")
# Combine these attributes, then link back to the sf object with the polygon geometry
# so we can map it...
nhoods_summ_stats <- inner_join(mean_temps, std_dev_temps, by = "ID")
nhoods_stateplane_ft_temps <- left_join(nhoods_stateplane_ft, nhoods_summ_stats, by = "ID")
# We could also have produced the same summary statistics using a for loop as given
# below, but the raster::extract function means less typing...
output <- tibble()
for (i in seq_along(nhoods_stateplane_ft$NeighHood)) {
this_nhood <- mask(pm_rast, nhoods_stateplane_ft[i, ])
the_nhood_name <- nhoods_stateplane_ft$NeighHood[i]
the_mean_temp <- mean(getValues(this_nhood), na.rm = TRUE)
the_sd_temp <- sd(getValues(this_nhood), na.rm = TRUE)
this_record <- cbind(i, the_nhood_name, the_mean_temp, the_sd_temp)
colnames(this_record) <- c("ID", "Neighborhood", "MeanTemp", "StdDevTemp")
output <- rbind(output, this_record)
}
write_csv(output, "./data/Mean_Std_Deviation_Temp_By_Nhood_Using_For_Loop.csv")
One of the less awesome 😐 things about ggplot2 is that if we want to map raster data, we must first convert it to a data frame object. The code chunk below demonstrates how we might accomplish this.
Ultimately, we want to be able to say something about exposure to heat as an increasingly important climate change impact. For example, this news story from last fall asks:
Which Charlottesville neighborhoods are the hottest?
The equity implications of heat exposure are receiving more attention, especially within urban planning. Increasing vegetated cover, along with strategies like albedo modification, is most often considered as a tool for increasing heat resilience. Street trees can be part of the solution, but local governments (like the City of Charlottesville) can only plant trees on public property or in public rights-of-way. As a result, private property rights make urban greening efforts slower and more challenging to implement.
# Coerce the existing raster object to a data frame, while removing the NA values...
pm_rast_df <- pm_rast %>%
as.data.frame(xy = TRUE) %>%
drop_na(pm_t_f)
# Plot the estimated temperatures from the raster layer with the neighborhood
# boundaries overlaid...
capa_heat <- ggplot() +
geom_raster(data = pm_rast_df, aes(x = x, y = y, fill = pm_t_f)) +
scale_fill_gradientn(colors = brewer.pal(n = 8, name = "RdYlBu"), trans = "reverse") +
labs(fill = paste("Temp. ", intToUtf8(176), "F"),
title ="Late Afternoon Temperature - CAPA Heat Watch", subtitle = "24 August 2021") +
geom_sf(data = nhoods_stateplane_ft_temps, colour = "purple", fill = NA, alpha = 0.2, size = 1.25) +
theme_void()
capa_heat
ggsave("./plots/CAPA_Temperature_Estimates.png", units = "in", width = 16, height = 8)
# Create a histogram of estimated temperature values...
ggplot() +
geom_histogram(data = pm_rast_df, aes(x = pm_t_f), color="grey20", fill="dodgerblue", bins = 40) +
theme_minimal() +
labs(x = paste("Temp. ", intToUtf8(176), "F"),
y = "")
# Create a histogram of estimated temperature values with mean shown...
ggplot(data = pm_rast_df) +
geom_histogram( aes(x = pm_t_f), color="grey20", fill="dodgerblue", alpha = 0.4, bins = 40) +
geom_vline(aes(xintercept = mean(pm_t_f)), color = "firebrick", linetype="dashed", linewidth = 2) +
theme_minimal() +
labs(x = str_c("Temp. ", intToUtf8(176), "F"),
y = "")
This component of the exercise asks you to visualize the temperature
data we have derived. Take a few moments to explore the documentation
for the ggarrange function in the ggpubr package,
then proceed with the questions below.
- … ggpubr::ggarrange to display these maps together, then include and interpret this map in your submitted materials (i.e., comment on the spatial distribution of high and low values in each map and what they might tell us about heat stress in our fair city 😉)

Please submit an R notebook and knitted HTML file that shows your work and responses for each of the Exercises included in this lab exercise. Also, briefly comment on your experience with R during this lab exercise. Please upload your report to Collab by 5:00 pm on Friday, March 17th.
This Lab Exercise will be graded on a 100-point scale according to the rubric below:
Length and formatting (10 pts)
Clarity of writing and attention to detail (20 pts)
Technical Content (45 pts)
Reflective Content (25 pts)